Amendments to the Claims:

This listing of claims will replace all prior versions, and listings, of claims in the 
application: 

Listing of Claims: 

1. (currently amended) A method for balancing the load of a parallel processing system
having a plurality of parallel processing elements arranged in a loop, wherein each
processing element has a local number of tasks associated therewith, wherein r represents
the number for a selected processing element $PE_r$, and wherein each of said processing
elements is operable to communicate with a clockwise adjacent processing element and
with an anti-clockwise adjacent processing element, the method comprising:

determining within each of said processing elements a total number of tasks present 
within said loop; 

calculating a local mean number of tasks within each of said plurality of processing 
elements; 

calculating a local deviation from said local mean number within each of said 
plurality of processing elements; 

determining a sum weighted deviation from said local deviations within each of said 
processing elements for one-half of said loop in an anti-clockwise direction, said one- 
half of said loop being relative to each of said selected processing elements; 

determining a sum weighted deviation from said local deviations within each of said 
processing elements in one-half of said loop in a clockwise direction, said one-half of 
said loop being relative to each of said selected processing elements;

determining a clockwise transfer parameter and an anti-clockwise transfer parameter 
from said sum weighted deviations within each of said processing elements; and 

redistributing tasks among said plurality of processing elements in response to said 
clockwise transfer parameters and said anti-clockwise parameters within each of said 
plurality of processing elements.

2. (original) The method of claim 1 wherein said determining within each of said
processing elements a total number of tasks present within said loop, comprises: 

transmitting said local number of tasks associated with each of said processing 
elements to each other of said plurality of processing elements within said loop; 

receiving within each of said processing elements said number of local tasks 
associated with said each other of said plurality of processing elements; and 

summing said number of local tasks associated with each of said processing elements 
with said number of local tasks associated with each other of said plurality of processing 
elements. 

3. (original) The method of claim 1 wherein said determining said total number of tasks
present within said loop includes solving the equation $V = \sum_{i} v_i$, where $V$ represents said
total number of tasks, $2N$ represents the number of processing elements in said loop, and
$v_i$ represents said local number of tasks associated with an $i$th processing element in said
loop.
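
As an illustrative sketch (not claim language), the transmit, receive, and sum steps of claim 2 can be simulated in Python. The function name `total_tasks_per_pe`, the list-based model of the loop, and the choice to forward values clockwise one hop at a time are assumptions made for this example; the claims do not prescribe a data structure or a transfer order.

```python
def total_tasks_per_pe(local_counts):
    """Simulate every PE learning the loop-wide total V.

    local_counts[r] is the local number of tasks of PE r.
    Returns a list holding the total as computed inside each PE.
    """
    n = len(local_counts)
    totals = list(local_counts)       # each PE starts from its own count
    in_flight = list(local_counts)    # value each PE forwards on this step
    for _ in range(n - 1):
        # Every PE receives the value held by its anti-clockwise neighbour.
        in_flight = [in_flight[(r - 1) % n] for r in range(n)]
        totals = [t + x for t, x in zip(totals, in_flight)]
    return totals


print(total_tasks_per_pe([3, 6, 2, 5]))   # [16, 16, 16, 16]
```

After one fewer step than there are processing elements, each element has received every other local count exactly once, so all elements hold the same value of V.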

4. (currently amended) The method of claim 1 wherein said calculating a local mean
number of tasks within each of said plurality of processing elements ($PE_r$) includes
solving the equation $M_r = \mathrm{Trunc}((V + E_r)/2N)$, where $M_r$ is said local mean for $PE_r$,
where $2N$ is the total number of processing elements in said loop, and where $E_r$ is a
number in the range of 0 to $(2N-1)$, $V$ is the total number of tasks, and wherein each
processing element has a different $E_r$ value.

5. (currently amended) The method of claim 4 wherein said Trunc function is
responsive to the value of $E_r$ such that said total number of tasks ($V$) for said loop is equal
to the sum of the local mean number of tasks ($M_r$) for each of said plurality of processing
elements in said loop (i.e., $V = \sum_{i=-N}^{N-1} M_i$).

6. (currently amended) The method of claim 4 wherein said local mean
$M_r = \mathrm{Trunc}((V + E_r)/2N)$ for each local $PE_r$ within said loop is equal to one of $X$
and $(X+1)$.
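
The rounding behaviour recited in claims 4 to 6 can be checked numerically. The sketch below assumes, purely for illustration, that $E_r = r$ (one possible assignment of distinct values in the range 0 to 2N-1) and that V is non-negative, so that Python's floor division coincides with truncation. The function name `local_means` is hypothetical.

```python
def local_means(total_tasks, num_pes):
    """Return M_r = Trunc((V + E_r) / 2N) for every PE, with num_pes = 2N."""
    # Assumption: E_r = r, so each PE uses a different value in 0..num_pes-1.
    return [(total_tasks + e_r) // num_pes for e_r in range(num_pes)]


means = local_means(16, 6)   # V = 16 tasks on a loop of 2N = 6 PEs
print(means)                 # [2, 2, 3, 3, 3, 3]
print(sum(means))            # 16
```

In this example every local mean is either X = 2 or X + 1 = 3, and the local means add up to V, so rounding neither creates nor loses a task.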

7. (original) The method of claim 1 wherein said calculating a local deviation within each 
of said plurality of processing elements comprises finding the difference between said 
local number of tasks and said local mean number for each of said plurality of processing 
elements. 
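
As a short illustration of claim 7, the following sketch subtracts each local mean from the corresponding local number of tasks. The task counts are hypothetical, and `local_deviations` is a name chosen for this example.

```python
def local_deviations(local_counts, local_means):
    """Deviation of each PE: local number of tasks minus its local mean."""
    return [v - m for v, m in zip(local_counts, local_means)]


counts = [7, 1, 0, 2, 5, 1]      # hypothetical local task counts, V = 16
means = [2, 2, 3, 3, 3, 3]       # local means that sum to V = 16
print(local_deviations(counts, means))   # [5, -1, -3, -1, 2, -2]
```

Because the local means sum to V, the deviations sum to zero: every surplus on the loop is matched by a deficit elsewhere.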

8. (original) The method of claim 1 wherein said determining a sum weighted deviation 
within each of said processing elements for one-half of said loop in an anti-clockwise 
direction comprises: 

assigning a weight to each other of said plurality of processing elements within said 
loop; 

transmitting said local deviation and said weight associated with each of said 
processing elements half way around said loop in an anti-clockwise direction, said one- 
half of said loop being relative to each of said selected processing elements; 

receiving said local deviation and said weight associated with each other of said 
processing elements half way around said loop in a clockwise direction, said one-half of 
said loop being relative to each of said selected processing elements; and 

summing the product of said local deviation and said weight associated with each 
other of said processing elements half way around said loop in a clockwise direction. 

9. (original) The method of claim 1 wherein said determining a sum weighted deviation 
within each of said processing elements in one-half of said loop in a clockwise direction 
comprises: 

assigning a weight to each other of said plurality of processing elements within said 
loop;

transmitting said local deviation and said weight associated with each of said
processing elements half way around said loop in a clockwise direction, said one-half of
said loop being relative to each of said selected processing elements;

receiving said local deviation and said weight associated with each other of said
processing elements half way around said loop in an anti-clockwise direction, said one-half
of said loop being relative to each of said selected processing elements; and

summing the product of said local deviation and said weight associated with each
other of said processing elements half way around said loop in an anti-clockwise direction.
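
The two half-loop sums of claims 8 and 9 can be sketched as follows. The claims do not state the weights, so the weight used here, 2(N - k) for an element k positions away on a loop of 2N elements, is an assumption chosen for illustration. Treating increasing index as the clockwise direction, and the names `assumed_weight` and `half_loop_sums`, are likewise assumptions.

```python
def assumed_weight(k, half):
    """Hypothetical weight for an element k positions away (half = N)."""
    return 2 * (half - k)


def half_loop_sums(deviations, r, weight=assumed_weight):
    """Return (A, C) for PE r.

    A sums weight * deviation over the half loop on the anti-clockwise side,
    C does the same over the half loop on the clockwise side.
    """
    n = len(deviations)      # number of PEs, assumed even (n = 2N)
    half = n // 2            # N
    anti = sum(weight(k, half) * deviations[(r - k) % n]
               for k in range(1, half + 1))
    clock = sum(weight(k, half) * deviations[(r + k) % n]
                for k in range(1, half + 1))
    return anti, clock


devs = [5, -1, -3, -1, 2, -2]        # hypothetical deviations, sum is zero
print(half_loop_sums(devs, 0))       # (-4, -10)
```

With this weight the element diametrically opposite the selected element contributes nothing to either sum, since its weight is zero.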

10. (currently amended) The method of claim 1 wherein said determining a clockwise 
transfer parameter and an anti-clockwise transfer parameter within each of said 
processing elements comprises: 

setting $T_a = (S/2) - \Delta$; and

setting $T_c = (S/2) + \Delta$, where $T_c$ represents said clockwise transfer parameter, $T_a$
represents said anti-clockwise transfer parameter, $\Delta = (A - C)/4N$, $A$ represents the sum
weighted deviation within each of said processing elements in one-half of said loop in an
anti-clockwise direction, $C$ represents the sum weighted deviation within each of said
processing elements in one-half of said loop in a clockwise direction, $S$ represents said
local deviation, and $N$ represents the number of PEs on the loop.
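
A worked example may help show how the two transfer parameters of claim 10 fit together. The sketch assumes a loop of 2N elements (as in claims 3 and 4), hypothetical weights of 2(N - k) at distance k, and increasing index as the clockwise direction; none of these choices is stated in the claims. Exact fractions are used because the parameters need not be whole numbers.

```python
from fractions import Fraction


def transfer_parameters(deviations, r):
    """Return (T_a, T_c) for PE r, with delta = (A - C) / 4N."""
    n = len(deviations)                  # assumed even: n = 2N
    half = n // 2                        # N
    s = deviations[r]                    # local deviation S
    # Assumed weights 2 * (N - k) at distance k from PE r.
    a = sum(2 * (half - k) * deviations[(r - k) % n]
            for k in range(1, half + 1))     # anti-clockwise half
    c = sum(2 * (half - k) * deviations[(r + k) % n]
            for k in range(1, half + 1))     # clockwise half
    delta = Fraction(a - c, 4 * half)
    return Fraction(s, 2) - delta, Fraction(s, 2) + delta


devs = [4, -1, -2, -1]                   # hypothetical deviations, sum is zero
params = [transfer_parameters(devs, r) for r in range(len(devs))]
for r, (t_a, t_c) in enumerate(params):
    print(f"PE {r}: T_a = {t_a}, T_c = {t_c}")
```

A negative parameter is read here as tasks arriving from that direction. Under the assumed weights, the clockwise parameter of each element equals the negated anti-clockwise parameter of its clockwise neighbour, and $T_a + T_c = S$ for every element, so carrying out the transfers brings each deviation to zero.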

11. (currently amended) The method of claim 1 wherein said determining a clockwise
transfer parameter and an anti-clockwise transfer parameter within each of said
processing elements comprises at least one of:

setting $T_c = \mathrm{Trunc}[(2S + \Delta)/4]$ and $T_a = S - T_c$; and
setting $T_a = \mathrm{Trunc}[(2S - \Delta)/4]$ and $T_c = S - T_a$;

where $T_c$ represents said clockwise transfer parameter, where $T_a$ represents said
anti-clockwise transfer parameter, where $Mag = \mathrm{abs}(2S)$, and where $S$ represents the
local deviation of a selected processing element, where $\Delta$ represents the number of tasks
passing through the current processing element, whereby if $\Delta > Mag$ then set $\Delta$ equal
to $Mag$ and if $\Delta < -Mag$, then set $\Delta$ equal to $-Mag$.
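
The clamping and truncation of claim 11 can be illustrated with the first of the two alternatives. The sketch assumes that Trunc rounds toward zero (as Python's `math.trunc` does) and takes the through-flow value as a given input; the function name `truncated_transfers` and the sample inputs are hypothetical.

```python
import math
from fractions import Fraction


def truncated_transfers(s, delta):
    """Return (T_a, T_c) with T_c = Trunc[(2S + delta)/4] and T_a = S - T_c."""
    mag = abs(2 * s)
    delta = max(-mag, min(mag, delta))   # limit delta to the range -Mag..Mag
    # Assumption: Trunc discards the fractional part (rounds toward zero).
    t_c = math.trunc(Fraction(2 * s + delta, 4))
    return s - t_c, t_c


print(truncated_transfers(3, 1))     # (2, 1)
print(truncated_transfers(3, 10))    # (0, 3)
print(truncated_transfers(-3, 1))    # (-2, -1)
```

Limiting the through-flow value to the range -Mag to Mag keeps the clockwise parameter between 0 and S, so in this sketch neither parameter exceeds the local deviation in magnitude and the two always sum to S.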

12. (currently amended) A method for reassigning tasks among an odd numbered plurality of 
processing elements within a parallel processing system, said processing elements being 
connected in a loop and each having a local number of tasks associated therewith, the 
method comprising: 

determining a total number of tasks on said loop; 

computing a local mean value for a selected processing element; 

computing a local deviation for said selected processing element, said local deviation 
representative of the difference between said local number of tasks for said selected 
processing element and said local mean value for said selected processing element; 

inserting a phantom processing element within said loop having a local deviation of 
zero when the loop is comprised of an odd number of processing elements;

assigning a weight to each of said plurality of processing elements; 

summing a weighted deviation from said local deviations of said processing elements 
located within one-half of the loop in an anti-clockwise direction relative to said selected 
processing element; 

summing said weighted deviation from said local deviations of said processing 
elements located within one-half of the loop in a clockwise direction relative to said 
selected processing element; 

computing a number of tasks to transfer in a clockwise direction for said selected 
processing element from said sum weighted deviations;

computing a number of tasks to transfer in an anti-clockwise direction for said
selected processing element from said sum weighted deviations; and

reassigning tasks relative to the said number of tasks to transfer in a clockwise
direction and said number of tasks to transfer in an anti-clockwise direction.

13. (original) The method of claim 12 wherein said determining the total number of tasks on 
said loop, comprises:

transmitting said local number of tasks associated with each of said processing
elements to each other of said plurality of processing elements within said loop; 

receiving within each of said processing elements said number of local tasks 
associated with said each other of said plurality of processing elements; and 

summing said number of local tasks associated with each of said processing elements 
with said number of local tasks associated with each other of said plurality of processing 
elements. 

14. (currently amended) The method of claim 12 wherein computing a local mean value for
a selected processing element includes solving the equation $M_r = \mathrm{Trunc}((V + E_r)/2N)$,
where $M_r$ is said local mean for a selected processing element $PE_r$, $2N$ is the total number
of processing elements in said loop, and $E_r$ is a number in the range of 0 to $(2N-1)$, $V$ is
the total number of tasks, and wherein each processing element has a different $E_r$ value.

15. (currently amended) The method of claim 14 wherein said Trunc function is
responsive to the value of $E_r$ such that said total number of tasks ($V$) for said loop is equal
to the sum of the local mean number of tasks ($M_r$) for each of said plurality of processing
elements in said loop (i.e., $V = \sum_{i=0}^{2N-1} M_i$).

16. (currently amended) The method of claim 12 wherein said inserting a phantom 
processing element within said loop further comprises: 

locating said phantom processing element in a position within said loop that is 
diametrically opposed to said processing element.
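
The placement of the phantom processing element can be pictured with a small sketch. It assumes the loop is modelled as a list of local deviations read clockwise and returns the padded loop as seen from the selected element; the function name `ring_with_phantom` and the sample deviations are hypothetical.

```python
def ring_with_phantom(deviations, r):
    """Return the loop as seen clockwise from PE r, with a zero-deviation
    phantom element inserted diametrically opposite PE r."""
    n = len(deviations)
    if n % 2 == 0:
        raise ValueError("a phantom is only inserted for an odd number of PEs")
    half = (n + 1) // 2      # N, for a padded loop of 2N elements
    # Position 0 is PE r itself; position k is k steps clockwise from it.
    rotated = [deviations[(r + k) % n] for k in range(n)]
    # The phantom takes position N, half way around the padded loop.
    return rotated[:half] + [0] + rotated[half:]


devs = [2, -1, 1, -3, 1]             # hypothetical deviations of 5 PEs
print(ring_with_phantom(devs, 0))    # [2, -1, 1, 0, -3, 1]
print(ring_with_phantom(devs, 2))    # [1, -3, 1, 0, 2, -1]
```

The padded loop has an even number of elements, so each half of the loop relative to the selected element contains the same number of positions, and the phantom adds nothing to the total deviation.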

17. (original) The method of claim 12 wherein said assigning a weight to each of said 
plurality of processing elements includes assigning a weight dependent upon each of said 
processing element's location to said selected processing element. 



Appl. No. 10/689,336
Amdt. dated 31 October 2007
Reply to Office action dated 31 July 2007

18. (original) The method of claim 12 wherein said computing a local mean value for a 
selected processing element, said computing a local deviation for said selected processing 
element, said inserting a phantom processing element within said loop, said assigning a 
weight to each of said plurality of processing elements, said summing said weighted 
deviation of said processing elements located within one-half of the loop in an anti- 
clockwise direction, summing said weighted deviation of said processing elements 
located within one-half of the loop in a clockwise direction, computing a number of tasks 
to transfer in a clockwise direction for said selected processing element, computing a 
number of tasks to transfer in an anti-clockwise direction for said selected processing 
element, and reassigning tasks relative to the said number of tasks to transfer in a 
clockwise direction and said number of tasks to transfer in an anti-clockwise direction are 
completed simultaneously for each of said plurality of processing elements within said 
loop. 

19. (original) The method of claim 12 wherein said summing said weighted deviation of said 
processing elements located within one-half of the loop in an anti-clockwise direction 
relative to said selected processing element comprises: 

transmitting said local weighted deviation associated with each of said processing 
elements half way around said loop in an anti-clockwise direction, said one-half of said 
loop being relative to each of said selected processing elements; 

receiving said local weighted deviation associated with each other of said processing 
elements half way around said loop in a clockwise direction, said one-half of said loop 
being relative to each of said selected processing elements; and 

summing said local weighted deviations associated with each other of said processing
elements half way around said loop in a clockwise direction.

20. (original) The method of claim 12 wherein summing said weighted deviation of said 
processing elements located within one-half of the loop in a clockwise direction relative 
to said selected processing element comprises:

transmitting said local weighted deviation associated with each of said processing
elements half way around said loop in a clockwise direction, said one-half of said loop
being relative to each of said selected processing elements;

receiving said local weighted deviation associated with each other of said processing
elements half way around said loop in an anti-clockwise direction, said one-half of said
loop being relative to each of said selected processing elements; and 

summing said local weighted deviations associated with each other of said processing 
elements half way around said loop in an anti-clockwise direction. 

21. (currently amended) A computer readable memory device carrying a set of instructions
which, when executed, perform a method comprising: 

determining within each of said processing elements a total number of tasks present 
within said loop; 

calculating a local mean number of tasks within each of said plurality of processing 
elements; 

calculating a local deviation from said local mean number within each of said 
plurality of processing elements; 

determining a sum weighted deviation from said local deviations within each of said 
processing elements for one-half of said loop in an anti-clockwise direction, said one- 
half of said loop being relative to each of said selected processing elements; 

determining a sum weighted deviation from said local deviations within each of said 
processing elements in one-half of said loop in a clockwise direction, said one-half of 
said loop being relative to each of said selected processing elements;

determining a clockwise transfer parameter and an anti-clockwise transfer parameter 
from said sum weighted deviations within each of said processing elements; and 

redistributing tasks among said plurality of processing elements in response to said 
clockwise transfer parameters and said anti-clockwise parameters within each of said 
plurality of processing elements.